[57] ABSTRACT

A system for automatically storing a message comprises a 
first telecommunication device for transmitting and/or 
receiving an audio message to a second telecommunication 
device. Both devices are coupled through a first network for 
transmitting and receiving telephone calls. A first data pro- 
cessing system with a speaker dependent data base and a 
second data processing system are provided, both are 
coupled through a second network for data communication. 
The first data processing system is coupled with the first 
telecommunication device and the second data processing 
system is coupled with the second telecommunication 
device. At least the first data processing system has a speech 
recognition system, the second telecommunication device 
has a control unit which generates a signal after receiving the 
audio message from said first telecommunication device and 
has a compare unit. Upon generating the signal the second 
data processing system converts the audio message into 
digital data and the compare unit compares the size of the 
digital data with the size of the data base, whichever is 
smaller is sent to the other data processing system, which 
converts the digital data into a text file. 

2 Claims, 4 Drawing Sheets 
SPEECH AND TEXT MESSAGING SYSTEM
WITH DISTRIBUTED SPEECH
RECOGNITION AND SPEAKER DATABASE
TRANSFERS

BACKGROUND OF THE INVENTION

The present invention relates to a system and a method of
converting and storing an audio message into a text file.
Storing audio messages is well known, for example from
automatic answering machines. Automatic answering
machines are convenient for leaving a message for someone
who is not available at a certain time or for a certain period
of time. Data processing systems, such as a personal
computer, are nowadays equipped with modems and sound
units which are capable of converting such a system into a
multi-media telecommunication device. This multi-media
telecommunication device could be a telephone, answering
machine, fax, network component or network peripheral,
etc.

While such a system is very convenient for regular use, it
needs a certain amount of memory for storing the audio
messages, and it is difficult to organize and manage a larger
amount of audio messages (e.g., in a data base) because the
content of the message cannot be visually recognized. Also,
such a system is not useful for any person with a hearing
impairment. Call centers that handle a large amount of audio
messages, e.g., orders, often need a written text instead of a
spoken message. In many cases, these call centers monitor
certain calls and store them. A text file which contains the
content of the call can also be helpful. Therefore, for many
uses, a written text which can be visualized is needed rather
than an audio message.

SUMMARY OF THE INVENTION

Thus, it is an object of the present invention to provide a
system which converts and stores an audio message into a
text file.

According to the invention, a system for automatically
storing a message comprises a telecommunication device for
transmitting and receiving an audio message coupled to a
telecommunication network. It further comprises a data
processing system including a speech recognition system
connected to the telecommunication device. The telecommunication
device has a control unit which transfers the
audio message to the data processing system. The data
processing system then converts said audio message into a
digital signal. Further, the system has a memory to store said
digital signal. The speech recognition system converts the
digital signal into a text file and stores it in its memory. The
system may have indicating means, such as a signal lamp, to
indicate to the user that a message has been received.

Such a system can be preferably implemented in a multi-media
computer system, such as a personal computer with
speech recognition system and voice-modem capabilities.
The converted audio message can be stored and managed in
a message data base or a message managing system as a text
file. This is advantageous as a user can easily select a
message out of a plurality of messages when all messages
are in a visualized text form. For example, the user can select
the beginning words, names, etc. of the respective messages.

The speech recognition system can be speaker dependent
or speaker independent. The advantage of a speaker dependent
recognition system is that it usually provides a large
vocabulary, whereas when using a speaker independent
system only a smaller number of words can be recognized.

If the speech recognition system is speaker independent,
it receives the audio message and converts it into a text file.
The data processing system can then process this text file
easily, e.g., in a message management program. If the speech
recognition system is speaker dependent, the speech recognition
system has to be adapted to the respective speaker/caller.
In this case, every telecommunication device, e.g., a
telephone set, is part of or connected to a respective data
processing system, such as a multi-media personal computer.
Each system is equipped with a speaker dependent
speech recognition system having a data base or a parameter
set which has been adapted individually to the respective
owner's voice. This individually different data base or
parameter set is then transmitted from the respective caller's
data processing device, via a data communication network,
to the called person's data processing system which then
converts the audio message into a text file.

In another embodiment, a system for automatically storing
a message comprises a first telecommunication device
for transmitting an audio message to and for receiving an
audio message from a second telecommunication device.
Both devices are coupled to a telecommunication network,
and a first and second data processing system are coupled to
a data communication network. At a minimum the first data
processing system has a speech recognition system, whereby
the second telecommunication device has a control unit
which generates a signal after receiving an audio message
from the first telecommunication device. In such a system,
the speech recognition system is also preferably but not
necessarily speaker dependent.

Upon reception of the signal, the second data processing
system converts the audio message into a digital data signal
and transmits this digital data signal, via the data communication
network, to the first data processing system. The
speech recognition system of the first data processing system
then converts the digital data into a text file and transmits
this text file back to the second data processing system.

Such a system can further comprise a comparing unit
which compares the size of the digitized audio message with
the size of the speech recognition data base or parameter set.
Whichever file is greater remains at that location, and the
other file will be transferred to this location by means of the
data communication network. The audio message is therefore
either converted at the data processing system of the
called party with the calling party's data base or parameter
set being transferred, or at the data processing system of the
calling party with the digitized audio message being transferred.
Data transfer cost will be minimized.

A further method according to the present invention
comprises the steps of: calling the second telecommunication
device via the first telecommunication device; then
transmitting a signal from the second telecommunication device
to the first telecommunication device requesting that the
audio message will be transferred to the first data processing
system; then converting the audio message into a text file by
means of the speech recognition system; and finally transferring
the text file to the second data processing system via
the data network. After transmission of the signal which
indicates that the second telecommunication device is busy
or the called party cannot answer at this moment, either the
called data processing system or the calling data processing
system can generate an automated answer. This automated
answer can be the same as the automated answers already
provided by commercially available answering machines. If
the first data processing system generates the answer, the
connection through the telecommunications network can be
interrupted. The advantage of this method is that the actual
connection time through the telecommunication network
can be kept very short, typically only a few seconds. Thus,
telephone costs, in particular for long distance calls, are less
expensive because the duration of the answer start message
and the duration of recording the message is not part of the
actual telephone call.

Another method according to the present invention com- 
prises the steps of: calling a telecommunication device; 
storing the audio message in the data processing system 
associated with the called telecommunication device; trans- 
ferring the audio message to the calling data processing 
system via said data network; converting the audio message 
into a text file by means of a speech recognition system; and 
finally transferring the text file to the called data processing 
system via said data network. 

A further method according to the present invention 
comprises the steps of: calling a telecommunication device 
via a telecommunication network; sending a signal to the 
calling data processing system indicating that the speech 
recognition data base or parameter set of the calling data 
processing system will be transferred to the called data 
processing system; and then converting the audio message 
into a text file by means of the speech recognition system of 
the called data processing system. 

All of the above-described methods can easily be per- 
formed by a multi-media personal computer which includes 
a speech recognition system and which is connected to a 
telecommunication network, e.g., by means of a voice- 
modem, and to a data communication network, such as the 
INTERNET, or by means of a local area network or the same 
telecommunication network. The whole system can also be 
integrated into a telecommunication device with a
computer-like display and keyboard.

The methods according to the present invention all reduce
the costs associated with using a telecommunication network.
For example, access to the INTERNET generates only
local telephone costs. The transmission of a text file through
a data communication network also does not have to be
synchronized with the actual telephone call and can be
performed at any time after the call.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a first embodiment of a
system according to the present invention,

FIG. 2 is a block diagram of a portion of an integrated
system according to the invention,

FIG. 3 is a block diagram of a second embodiment of a
system according to the present invention,

FIG. 4 is a block diagram of a third embodiment of a system
according to the present invention,

FIG. 5 is a flow chart showing a first method according to
the present invention,

FIG. 6 is a flow chart showing a second method according
to the present invention, and

FIG. 7 is a flow chart showing a third method according
to the present invention.

DESCRIPTION OF THE PREFERRED
EMBODIMENTS

FIG. 1 shows a block diagram of a first embodiment of the
invention. Telecommunication device 1 can be, for example,
a telephone set or a multi-media personal computer
equipped with a voice-modem, etc. The telephone set 1 is
connected to a common telecommunication network 2, such
as an ISDN network or a standard analog network. A second
telecommunication device 3 is also connected to this network 2.
This second telecommunication device 3 is further
connected to a data processing system 4, such as a personal
computer, which includes or which is connected to a speech
recognition system 5. The speech recognition system 5
comprises a speech recognition data base 6 and is speaker
independent. The term data base is used throughout the
following description for a unit that can store any kind of
parameter set or data which is necessary to run a speaker
dependent or speaker independent speech recognition system.
This unit can be a separate memory device connected
to the speech recognition system or it can be incorporated
into the speech recognition system. Telecommunication
device 3, data processing system 4 and speech recognition
system 5 can be incorporated in a multi-media personal
computer as described above.

If the user does not answer a telephone call made from the
telecommunication device 1 via network 2 to telecommunication
device 3, data processing system 4 automatically
provides an answering message which includes the request
to leave a message on the system. This is done in a manner
known from standard answering machines. The answering
message can be a synthesized voice message or a digitized
spoken message.

FIG. 2 shows portions of an embodiment of an integrated
system. The analog voice message 12 is fed to an analog/digital
converter 8 which converts it to digital data. The
analog signal can also be converted by a codec in the
voice-modem (not shown). This digital data will be stored in
a memory 9. A speech recognition system 10 is provided
which is also connected to memory 9. The integrated system
is controlled by a CPU 11 which is connected to all elements
in this system. For permanent storing of the text file, a hard
disk 20 is provided which is coupled with the system 7.
After receiving and converting the audio message, it is
stored as digital data in memory 9 of the system 7 or 4. Then,
speech recognition system 5 or 10 converts this digitized
audio message into a text file which will be permanently
stored, for example on a hard disk 20. The speech recognition
system can be any system known in the art. For
example, U.S. Pat. No. 5,293,584 and U.S. Pat. No. 4,799,262
disclose different available speech recognition systems.
The received text files can then be presented to the user in
a way similar to an e-mail system or they can be stored in
a message data base for further processing. The telecommunication
device, e.g., a telephone set, can be equipped
with an indicator lamp 30, such as an LED, which indicates
that a new message has been received. The indicator lamp
can also be incorporated in the data processing system 4.
Such a system is particularly useful for persons with hearing
impairment or for anyone who needs information in a
written visualized form.
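
The signal path of FIG. 2 can be illustrated by a minimal sketch. It is an illustration under stated assumptions, not the disclosed implementation: the class name IntegratedSystem, the 8-bit quantization and the recognize callable are hypothetical stand-ins for converter 8, memory 9, speech recognition system 10 and hard disk 20.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class IntegratedSystem:
    """Hypothetical model of the FIG. 2 elements (names are assumptions)."""
    recognize: Callable[[bytes], str]                    # stands in for recognition system 10
    memory: List[bytes] = field(default_factory=list)    # stands in for memory 9
    hard_disk: List[str] = field(default_factory=list)   # stands in for hard disk 20
    lamp_on: bool = False                                # stands in for the indicator lamp

    def digitize(self, analog_samples: List[float]) -> bytes:
        # Stand-in for A/D converter 8: clamp to [-1, 1], quantize to 8 bits.
        return bytes(
            int((max(-1.0, min(1.0, s)) + 1.0) * 127.5) for s in analog_samples
        )

    def receive(self, analog_samples: List[float]) -> str:
        digital = self.digitize(analog_samples)  # analog message -> digital data
        self.memory.append(digital)              # digital data stored in memory
        text = self.recognize(digital)           # digital data -> text file
        self.hard_disk.append(text)              # text file stored permanently
        self.lamp_on = True                      # signal that a message arrived
        return text
```

Any recognizer can be plugged in through the recognize argument; the sketch only fixes the order of the steps (digitize, store, recognize, store text, indicate).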

FIG. 3 shows another example of an embodiment of the
present invention. Two telecommunication devices, such as
telephone sets 1 and 3, are coupled through a telecommunication
network 2, such as an ISDN network or a standard
analog network. Data processing systems, such as personal
computers (PC) 13 and 4, are associated with telephone sets
1 and 3. PC 13 is connected to telephone set 1, whereas PC
4 may or may not be connected to telephone set 3. This is
indicated by dotted lines. Telephone sets 1 and 3 can be
equipped with or can be connected to control units 17 and
18, respectively. These control units 17 and 18 may also be
incorporated in PCs 13 and 4, or their function may be
provided by PCs 13 and 4 or can be part of terminals 1 and
3. PCs 13 and 4 are coupled to each other through a data
communication network 14, such as the INTERNET, an
ISDN network or a LAN, etc. This coupling can also be
done through the same telecommunication network 2 by
means of modems, etc. PC 13 is connected with or equipped
with a speech recognition system 15 which in turn is
connected to or includes a speech recognition data base 16.
PC 4 may also have such a speech recognition system 5 and
database 6.

FIG. 5 shows a flow chart of the function of this system.
In the following example user A with telephone set 1 and PC
13 tries to call user B with telephone set 3 and data
processing unit (PC) 4 (FIG. 5, step 30), but user B cannot
answer the call. In this case, control unit 17 generates a
control signal which is sent back to telephone set 1 (FIG. 5,
step 31). In a digital ISDN network, this signal will be a
digital control signal, whereas in an analog network such a
digital control signal has to be converted, e.g., through a
modem, which is part of control unit 17 or PC 4. This control
signal contains data about the called party, such as name,
e-mail address, etc. Before sending this signal, control unit
17 might send an answer message to user A indicating that
the calling party should leave a message. Instead of generating
this answer message with control unit 17 or PC 4, this
can be done by control unit 18 or PC 13 after receiving the
respective control signal. In this case, a standard answer
message would be generated such that, for example, the
transmitted name of the called party is inserted. For
example, the message may state the following: "The number
you called 'Mister X' is not available, please leave a
message." After sending the control signal, the connection
between the two telephone sets 1 and 3 can be terminated.
If only the control signal is sent, the connection time will be
very short, and therefore only a minimum of telephone costs
will be incurred. This is advantageous particularly with long
distance calls.
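
The insertion of the transmitted name into a standard answer message can be sketched as follows. This is only an illustration: the ControlSignal class, its field names and the function build_answer_message are hypothetical, and the description does not specify how the control signal is encoded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlSignal:
    """Hypothetical container for the called-party data carried by the signal."""
    called_name: str
    called_email: str


# Wording taken from the example message quoted in the description.
ANSWER_TEMPLATE = (
    "The number you called '{name}' is not available, please leave a message."
)


def build_answer_message(signal: ControlSignal) -> str:
    # The calling side inserts the transmitted name into the standard message.
    return ANSWER_TEMPLATE.format(name=signal.called_name)
```

Because the message is assembled on the calling side from a few bytes of called-party data, the telephone connection is needed only long enough to deliver the control signal.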

Hereinafter, a connection is established between telephone
set 1 and PC 13 to transfer a message to PC 13 (FIG.
5, step 32). Therefore, telephone set 1 is provided with an
analog or digital interface. PC 13 can comprise a system,
such as shown in FIG. 2. PC 13 converts the audio message
into digital data (FIG. 5, step 33) which will be stored in its
memory 9. Speech recognition system 15 or 10 then converts
the digitized audio message into a text file. This text
file is then sent to PC 4 (FIG. 5, step 34) which is associated
with telephone set 3 of the called user B. Finally, the text file
is stored permanently, for example, in a data base or message
handling system, such as an e-mail system.

Another embodiment of the present invention is shown in
FIG. 4 with the respective flow charts in FIG. 6 and FIG. 7.
FIG. 4 is similar to FIG. 3. For example, PC 4 is connected
to telephone set 3, but a possible connection exists between
PC 13 and telephone set 1. This possible connection is again
indicated by dotted lines. The speech recognition systems
15, 16 and 5, 6 are again speaker dependent. The data bases
16 and 6 contain parameters which are speaker dependent
and necessary for running the respective speech recognition
program. These parameters are created when individual
users set up the respective systems.

If a call from telephone set 1 to telephone set 3 is made
(FIG. 6, step 40; FIG. 7, step 50), control unit 19 generates
an answering message as described above. The connection
between the two telephone sets 1 and 3 is hereinafter upheld,
while the audio message is transferred through network 2 to
PC 4 where it will be converted into digital data and stored
as described above (FIG. 6, step 41; FIG. 7, step 51). In a
first mode, PC 4 sends this digital data via the data communication
network 14 to PC 13 (FIG. 6, step 42) where it
will be converted into a text file (FIG. 6, step 43) by means
of the speaker dependent speech recognition system 15, 16
and then transmitted back to PC 4 (FIG. 6, step 44) where
it will be stored permanently. In a second mode, PC 4
requests the digital data from speech recognition data base
16 (FIG. 7, step 52) of the associated speech recognition
system 15. After the digital data of the database 16 is
received and stored in database 6 (FIG. 7, step 53), PC 4
converts the audio message into a text file (FIG. 7, step 54).
This text file can be handled as described above.
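
The two modes of FIG. 6 and FIG. 7 differ only in what crosses data communication network 14. The following sketch records the transfers of each mode; the function names, the Recognizer signature and the dictionary used for the speaker parameters are assumptions made for illustration, not part of the disclosure.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical recognizer signature: (digitized audio, speaker parameters) -> text.
Recognizer = Callable[[bytes, Dict[str, float]], str]


def first_mode(audio: bytes, data_base_16: Dict[str, float],
               recognize: Recognizer) -> Tuple[str, List[str]]:
    """FIG. 6: the digitized audio travels to PC 13, the text travels back."""
    transfers = ["digital data: PC 4 -> PC 13"]   # step 42
    text = recognize(audio, data_base_16)         # step 43, performed at PC 13
    transfers.append("text file: PC 13 -> PC 4")  # step 44
    return text, transfers


def second_mode(audio: bytes, data_base_16: Dict[str, float],
                recognize: Recognizer) -> Tuple[str, List[str]]:
    """FIG. 7: the speaker data base travels to PC 4, the audio stays put."""
    transfers = ["data base 16: PC 13 -> PC 4"]   # step 52
    data_base_6 = dict(data_base_16)              # step 53, copy stored at PC 4
    text = recognize(audio, data_base_6)          # step 54, performed at PC 4
    return text, transfers
```

Both functions yield the same text for the same recognizer and parameters; only the list of network transfers differs.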

In addition, in a further embodiment, the receiving side 
can comprise a compare unit 19 which is connected to PC 4. 
The function of compare unit 19 can also be provided by PC 
4. With compare unit 19, it is possible to manage both 
above-described modes automatically. Therefore, compare
unit 19 compares the size of the digital data with the size of 
speech recognition data base 16. The size of the speech 
recognition data base might be predetermined or PC 13 can 
provide PC 4 with this information. Whichever is smaller 
will be transferred to the other PC 4 or 13 through data 
communication network 14. The conversion into a text file 
is then done either by speech recognition system 15 with the 
transmitted digital data or by the speech recognition system 
5 loaded with the transmitted individual data base 16. 
Transmitting of the text file, if necessary, and storing of the 
text file will be completed as described above. This method 
has the advantage of only using the minimum data commu- 
nication network time. 
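
The decision made by compare unit 19 can be sketched as a single comparison. The names Transfer and choose_transfer are hypothetical, sizes are assumed to be given in bytes, and the handling of equal sizes is an assumption, since the description does not address that case.

```python
from enum import Enum


class Transfer(Enum):
    """Hypothetical labels for the two possible transfers over network 14."""
    AUDIO_TO_PC_13 = "send digital data to PC 13"
    DATA_BASE_TO_PC_4 = "send data base 16 to PC 4"


def choose_transfer(audio_size: int, data_base_size: int) -> Transfer:
    """Pick the smaller item to send, as compare unit 19 is described to do."""
    if audio_size < 0 or data_base_size < 0:
        raise ValueError("sizes must be non-negative")
    # Assumption: on a tie the audio is sent; the source leaves this open.
    if audio_size <= data_base_size:
        return Transfer.AUDIO_TO_PC_13
    return Transfer.DATA_BASE_TO_PC_4
```

A short message is therefore sent to PC 13 for conversion, whereas a long message causes data base 16 to be sent to PC 4 instead.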

The above described applications are not necessarily 
limited to the function of an automatic answering machine. 
Such a system may be incorporated in any telecommunica- 
tion device, so any user can activate the system to convert 
an audio message into a text file. For example, a user can 
activate the system at any time during a call to save 
important parts of a conversation into a text file. 

We claim: 

1. A system for automatically storing a message comprising:
a first telecommunication device for transmitting and/or
receiving an audio message to a second telecommunication 
device, both devices being coupled through a first network 
for transmitting and receiving telephone calls, a first data 
processing system with a speaker dependent data base and a 
second data processing system, both being coupled through 
a second network for data communication, said first data 
processing system being coupled with said first telecommu- 
nication device and said second data processing system 
being coupled with said second telecommunication device, 
at least said first data processing system having a speech 
recognition system, said second telecommunication device 
having a control unit which generates a signal after receiving 
said audio message from said first telecommunication device 
and having a compare unit, upon generating said signal said 
second data processing system converting said audio mes- 
sage into digital data and said compare unit comparing the 
size of said digital data with the size of the data base, 
whichever is smaller being sent to the other data processing 
system, which converts the digital data into a text file. 

2. A system for automatically storing an audio message 
according to claim 1, wherein at least one of said telephone 
sets is formed within one of said data processing systems. 